perm filename VV.DOC[VV,BGB] blob sn#135775 filedate 1974-12-17 generic text, type C, neo UTF8
TITLE PAGE

VERIFICATION VISION

Bruce G. Baumgart
Robert C. Bolles

Abstract:
Main points (for abstract):
	A system organization for VV ... steps, models and tolerances
	Automatic prediction of `features' ... including curves
	Training ... including the location and description of curves
	Location and comparison ... simple `fixed' strategy ... use 1 to find 2nd
	Correction ... mathematics for transforms, 2-D ↔ 3-D etc. 

Contents:
	1. Introduction.
	2. Verification Vision System Design.
		2.1 VV Mandala
		2.2 Visual Representations
		2.3 Other Research
	3. Prediction.
		3.1 Prediction by simulation - BGB
		3.2 Prediction by training - RCB
	4. Comparison - RCB
		Comparison by correlation
		Comparison by feature elements.
	5. Correction - BGB
		Correcting the camera model.
		Correcting the world model.
	6. Application of Verification Vision.
	7. Conclusions.
	8. References.
1. INTRODUCTION

Introduction
	Definition ... not yes/no ... not recognition
		... predict ... compare ... correct
	covers (1) hypoth and test (2) stereo (3) pred environment ...
	example ... screw in hole (monocular) 
	    =>  (1) predictable w/i tolerances ... impt. tolerances
		(2) many types of info ...
	in the past there have been some "special purpose" vision hacks (pump)
		want to "systematize and automate" as much as possible
		... many applications: automation, cart, ... (pump paper for back)
		... also we believe there exists a sufficiently
			interesting set of low-level operators 
	Our approach to paper ... "theory and grand scheme" plus history section 2
		... sections on what we have done ... trying to keep speculation
		to a min ... and then in the conclusion come back to the grand
		scheme and bring things together ... and explore the next
		extensions and their potential (& difficulty)

	Verification vision is the process of synchronizing visual
prediction with visual perception.  Such verification may be
performed on several levels of abstraction: 2-D images, 3-D models,
and semantic descriptions.  However, in our recent work
on vision for a robot factory worker, predicted images
can be obtained which are nearly identical to the perceived images.
In such a case, the verification is done for the sake of measuring
small geometric differences which are expected but which cannot be
rapidly measured by other means.  That is, the identities of the image
elements are not in question, only their precise relative positions.
	
	Verification vision also includes "hypothesis and test," where
a predicted line, within a certain span of location, orientation,
and contrast, is compared with a line from a perceived image,
as in [FALK] and [SHIRAI].

	Finally, verification vision includes narrow angle
correlation stereo.  In this case the "prediction" is another image of the
same objects, but taken from a slightly different relative position.
The goal is to locate matching "features" (such as correlation patches)
in order to provide the stereo package with two positions for the same
part of the scene (see [Thomas]).  Notice, as mentioned above, that the
identities of the models (i.e. the line and the correlation patch) are not in
question; only their positions in the actual image.

(programmable assembly system).

  Such systems provide
complex, but predictable environments consisting of objects with curved,
textured surfaces.  There have been a few special-purpose programs which
perform verification vision tasks within such environments (eg. see
[BOLLES], [ROSSOL], ...), but there have been no generalized systems
which predict and locate curved objects.  Garvey and Agin at SRI have
each set up systems which deal with real objects, but are only peripherally
concerned with shapes.

        In this paper we present a design
for verification vision and describe a
system which has carried out the task of visually locating a bolt hole
in a brake assembly and visually servoing a bolt into the hole.
The brake assembly's
initial location was known to within plus or minus 10mm in
both X and Y, and plus or minus 10 degrees rotation about its center.
The location of the hole involved predicting and locating curves.  The
servoing is done in a stop-and-go fashion.  That is, the arm is moved and
stopped, a pair of stereo pictures is taken, a relative arm correction
is computed, and the arm is moved again.

         The next section describes our theory of computer vision
and shows how verification vision fits into this theory.
That section also characterizes some of the previous vision
research.  The following sections use the task mentioned above to guide
the description of the current implementation of our verification vision
system.

2. Verification Vision System Design.
2.1 VV Mandala
2.2 Visual Representations
2.3 Other Research

"TASK" 
   STEPS TO BE TAKEN FOR EACH SUBASSEMBLY (during a production run)

(Actual and Synthetic pictures  of the task)

use prediction to			have the arm pick
locate the hole				   up a BOLT 
with CAMERA 1					|
      |
      ↓						|
compute an estimate
for the 3-D change				|
from the prediction (& SUPPORT)
      |						|
      ↓						
use CAMERA 1's estimate				|
to locate the hole
with CAMERA 2					|
      |
      ↓						|
compute the 3-D
change from predicted				|
to actual (using stereo
to compute 3-D location)		    /

		\			/

		    use 3-D change to
		    correct the destination
		    of the bolt
			   |
			   ↓
		    have the arm move the
		    bolt to this position
			   | ← _________________________________
			   ↓					↑
		    use best hole position to			|
		    locate the bolt with
			CAMERA 1				|
			   |
			   ↓					|
		    use arm's Z to compute
		    an estimate for the				|
(same here)			    3-D position of the bolt
			   |					|
			   ↓
		    use this estimate to			|
		    locate the bolt with 
			CAMERA 2				|
			   |
			   ↓					|
		    use stereo to compute
		    3-D location and correction			|
		    for the next arm move
			   |					|
			   ↓
		    move the arm appropriately			|
		    if force sensors indicate that
		    the bolt hit the side of the hole,		|
		    stop ... if vision indicates that
		    the screw is in the	hole			|
		    (by 3-D position, occlusion,...),
		    stop   |					|
			   |					|
			   ↓____________________________________↑
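As a sketch only, the stop-and-go loop diagrammed above can be written out as follows.  All five routines passed in are hypothetical stand-ins for the actual camera, stereo, and arm code, which this draft does not name:

```python
def servo_bolt_into_hole(locate, stereo_3d, move_arm, hit_side, bolt_seated):
    """Stop-and-go visual servo loop: locate the bolt in both cameras,
    triangulate its 3-D position, correct the arm, and repeat until the
    force sensors or vision say the bolt is in the hole.

    All five arguments are assumed callables (placeholders, not the
    system's real routines):
      locate(camera)       -> 2-D bolt position in that camera's image
      stereo_3d(p1, p2)    -> 3-D position from the two image positions
      move_arm(correction) -> move the arm and stop
      hit_side()           -> True if force sensors felt the hole's side
      bolt_seated(xyz)     -> True if vision says the bolt is in the hole
    """
    while True:
        p1 = locate(camera=1)          # find the bolt with camera 1
        p2 = locate(camera=2)          # use that estimate with camera 2
        bolt_xyz = stereo_3d(p1, p2)   # stereo gives the 3-D location
        move_arm(correction=bolt_xyz)  # move, then stop (no tracking)
        if hit_side() or bolt_seated(bolt_xyz):
            break
```

The loop body matches the diagram: each pass is one picture pair, one 3-D correction, and one arm move, with the two stopping tests at the bottom.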


Proposed task continued

(Actual picture of brake subassembly and bolt)
(Predicted picture of brake subassembly and bolt)

MODELLING, TRAINING, CALIBRATION, AND PROGRAMMING THE STRATEGY

	GEOMED models of the brake subassembly and bolt
including curves (possibly just circles projecting
into 2-D ellipses ... want to be able to find out
the 3-D points that correspond to the points in the
2-D projection ... this would mean that one could
find out what points belong to a curve and simply
fit a curve thru them ...) ... these models also
should have photometric info (to generate synthetic
pictures and roughly estimate the contrast across
edges etc.) ... seems possible to automatically generate
the WALTZ labelling for all of the lines and curves if
they aren't too complicated ... this would be a help
for the "characterization" stage in "training"

**** NEED CURVE EXTENSION TO GEOMED ... IS THERE CURRENTLY
PROVISION FOR PHOTOMETRIC INFO??? HOW ABOUT THE
2-D REFERENCE BACK TO THE CORRESPONDING 3-D POINT???
TO DO THE WALTZ LABELLING NEED TO KNOW WHY A LINE
APPEARS IN A 2-D DRAWING (CONCAVE EDGE, CONVEX,
ONE SURFACE OCCLUDING ANOTHER, A SHADOW, CRACK, ETC.)
IS THAT TYPE OF INFO RETRIEVABLE??? ****
TRAINING ... step thru the procedure given above ... interactively
	keeping the internal model of what is going on in 
	synchronization with the actual situation (as viewed by
	the cameras and monitored by the arm)

TRAINING essentially consists of using the models to
	predict what will be seen, taking pictures to get what
	is actually seen, and updating (extending) the model
	so that it makes better predictions ...

TRAINING (potentially) produces a number of things:
	actual pictures of an example assembly
**** FOR VIDEO COMPARE (CORRELATION) ****
final calibration of the two cameras with respect
				    to each other and the work station
(compare syn w/ act.)		final photometric calibration (light levels etc.)
				    **** ACTUAL USE MAY BE LIMITED TO 
				    RANGE OF CONTRAST, ETC. ****
				characterizations of the features, eg. the contrast
				    across an edge, the confidence of finding the
				    best correlation for a certain patch, etc.
				    **** FOR TOPOLOGICAL COMPARE ****
(diagram showing 		estimates as to how accurate the implications
implied position of		    are which reduce the tolerances between where
curve and reduction		    a feature is expected and where it might be
of tolerances)			    (eg. how beneficial is it to have an edge
				    point on curve 6 ... what reduction in
				    tolerances can be made) ... also should point
				    out any possible confusing edges, correlations,
				    etc.
				    **** TOLERANCES ARE IMPORTANT FOR DETERMINING
				    WHICH TECHNIQUES TO USE ... FOR OBJECT POSITION,
				    CAMERA POSITION, LIGHT LEVELS, POSSIBLE
				    OCCLUSIONS, ETC. ... THE SYSTEM WILL PROBABLY
				    USE ONLY RECTANGLES (IN 2-D) TO REPRESENT
				    THE TOTAL ALLOWABLE FLUCTUATION ... TAYLOR
				    HAS SOME FANCIER THINGS WHICH MAY BE USEFUL
				    ... OR AT LEAST POSSIBLY PRETTY ENOUGH TO SHOW
				    IN A DIAGRAM OR TWO ...****

		    LOCATE THE HOLE WITH CAMERA 1
			Position the subassembly at (X0,Y0) and aim camera 1
			as desired ... to a "known" position

(pic showing		Use these positions to produce the expected view (using
overlay of pred.	hidden line elimination, curves, etc. to first produce
on actual)		a line drawing ... then a synthetic picture ... and
			finally as much of the Waltz-like information as possible)
			
			Use this expected view (mosaic +) to automatically
			locate the desired features (possibly altering the expected
			curves or the portion in the 3-D model which projects into
			that curve ???) and extract the characterization of the
(maybe diag		features ... probably will have to be interactive as
showing			opposed to completely automatic ... however, since training
adjustment)		is only done once, it seems ok if more time is required
			to do large searches to find the features ... hopefully
			the information gained will reduce the amount of this
			searching at run-time.
			**** ADJUSTING COMES IN TWO FORMS, AT LEAST, (1) MOVING
			AROUND IN THE 2-D PICTURE TO FIND THE APPROPRIATE MATCHING
			POINT AND (2) MODIFYING THE RELATIVE TRANSFORM BETWEEN
			THE CAMERA AND THE SUBASSEMBLY ... ESSENTIALLY THE IRON-
			TRIANGLE WORK ****
(diagram with possible features ranked by cost/benefit).
		At this point the system could roughly rank the features
		according to (1) how easily they can be found (eg. large
		and with contrast) and (2) how beneficial it would be to
		find it (eg. what reduction in tolerances might be
		expected)

			**** CURVES COULD BE RANKED BY LENGTH AND CONTRAST 
			PLUS THEIR CURVATURE ... THE MORE CURVATURE, THE BETTER
			IMPLICATIONS ONE CAN MAKE ABOUT WHERE YOU ARE ON THE CURVE,
			CORRELATIONS BY SIZE AND THE DISTINCTIVENESS OF THEIR
			AUTOCORRELATIONS (OR WHATEVER)  ... REALLY ONLY USED
			TO GIVE THE PROGRAMMER HELPFUL HINTS AS TO THE GOODNESS
			OF THE VARIOUS FEATURES ... AND AS DEMO OF A STEP TO
			COME IN AUTOMATIC STRATEGY PRODUCTION ****
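A minimal sketch of such a ranking follows.  The feature fields (length, contrast, expected tolerance reduction) and the multiplicative score are illustrative assumptions standing in for the rough cost/benefit measure described above, not the system's actual numbers:

```python
def rank_features(features):
    """Roughly rank candidate features by cost/benefit.

    Each feature is assumed to be a dict with illustrative fields:
      length               -- size of the feature in the image
      contrast             -- edge or patch contrast
      tolerance_reduction  -- expected narrowing of the tolerances
    Ease of finding (length * contrast, standing in for "large and
    with contrast") is multiplied by the expected benefit."""
    def score(f):
        ease = f["length"] * f["contrast"]       # how easily it can be found
        benefit = f["tolerance_reduction"]       # how much it helps
        return ease * benefit
    return sorted(features, key=score, reverse=True)
```

As the note above says, such a ranking would mainly give the programmer hints about the relative goodness of the features.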

(diagram from thesis)

			Having located the various features (a number of which
			will be correlation points) the `iron triangle' method
			can be used to determine the transform between camera 1
			and the subassembly ... possibly a version of this 
			`calibration' could be set up which takes more than
			three matching points ... ie. overdetermined system
			**** WHAT IS THE STATE OF THE "IRON TRIANGLE" METHOD?
			IS THERE ANY REASON TO TRY A FIT-WHEN-OVERDETERMINED
			VERSION OF IT???   ANY IDEA HOW ACCURATE IT IS??? ****
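On the fit-when-overdetermined question: a least-squares rigid transform from three or more matched 3-D point pairs can be computed in closed form with the SVD (the orthogonal Procrustes method).  This is a sketch of that idea under the assumption scene = R * model + t, not the iron-triangle code itself:

```python
import numpy as np

def fit_rigid_transform(model_pts, scene_pts):
    """Least-squares rotation R and translation t such that
    scene ~= R @ model + t, from N >= 3 matched 3-D points.
    An overdetermined generalization of a three-point calibration."""
    P = np.asarray(model_pts, dtype=float)   # N x 3 model points
    Q = np.asarray(scene_pts, dtype=float)   # N x 3 matching scene points
    cp, cq = P.mean(axis=0), Q.mean(axis=0)  # centroids
    H = (P - cp).T @ (Q - cq)                # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection sneaking in when the fit is degenerate.
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

With exact (noiseless) matches the recovered R and t are exact; with noisy matches the answer is the least-squares fit over all the points rather than a fit to any chosen triangle.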

			The same training steps can be taken for camera 2 ... 
(two hidden line drawing with occlusion).

			So far, only one position of the subassembly has been
			considered.  In order to write a program to locate the
			hole anywhere within the allowable tolerances (on X, Y,
			and the rotation about the center Z vector), the system
			should "look at" the various possibilities and make sure
			that a sufficient number of the features will be visible
			etc.  We currently assume that none of the features
			change significantly ... ie. the shadows don't change
			to interfere with the visual location of features, features
			are not obscured by other parts of the subassembly, etc.
			If such things were possible, the model for the expected
			scene could include explicit alternatives for the
			distinctly different appearances of the object.
(overlay tolerance box on picture)

			Eventually it would be desirable to have the system
			capable of automatically generating a strategy for locating
			the hole (or whatever is desired).  This would be done
			by simulating the various positions within the tolerances
			and deciding which features can be used to answer
			which questions about the object's location.  So far,
			the various visual location programs have been interactively
			set up to include a fixed sequence of checks.  Depending
			upon the initial tolerances, various techniques are used
			(eg. the hole location might use a couple of curve location
			steps because the total displacement may be large ... the
			bolt location may only use correlation because the
			tolerances at that point are (hopefully) very small).

			Our system should at least be able to display the possible
			positions (in a picture) for any point of the object.
			This is crucial for deciding upon the strategy.

			**** SEEMS TO BE ESSENTIALLY PUTTING A BOX AROUND THE
			2-D PROJECTIONS OF THE EXTREME POSITIONS ALLOWED WITHIN
			THE TOLERANCES ... EXTREME POSITIONS MAY NOT BE COMPLETELY
			HONEST AND RECTANGLES ARE CERTAINLY NOT GENERAL ENOUGH
			TO TAKE ADVANTAGE OF ALL OF THE INFORMATION, BUT I THINK
			THE IDEA IS CLEAR AND USEFUL ****
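The box-around-extremes idea can be sketched directly.  Here `project` is a hypothetical routine (not named in this draft) mapping a pose to the 2-D image position of the model point in question:

```python
def tolerance_box(project, extreme_poses):
    """Axis-aligned 2-D rectangle enclosing the projections of one model
    point over all of the extreme poses allowed by the tolerances.

    project(pose) -> (x, y) image position of the point for that pose.
    Returns ((xmin, ymin), (xmax, ymax)).  As noted above, a rectangle
    over the extremes is conservative, not fully general."""
    xs, ys = zip(*(project(pose) for pose in extreme_poses))
    return (min(xs), min(ys)), (max(xs), max(ys))
```

The resulting rectangle is exactly the "total allowable fluctuation" region that the run-time search for the feature would be limited to.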
(wide angle stereo picture)

			To recap:  features will be located in the two pictures,
			matched, and their 3-D position computed.  These 3-D
			positions will be used to compute the transform from
			the planned position to the actual position. 

			**** DO YOU HAVE ROUTINES TO COMPUTE THE 3-D LOCATION
			GIVEN TWO POSITIONS WITHIN 2-D PICTURES ... IE. TO FIND
			THE TWO RAYS IN SPACE AND `INTERSECT' THEM OR AT LEAST
			FIND THE POINT OF CLOSEST APPROACH ... A LA SOBEL??? OR
			OTHERS??? ****
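For the record, the point-of-closest-approach computation is straightforward.  This is a sketch (not Sobel's or anyone's actual routine) that returns the midpoint of the shortest segment between the two camera rays:

```python
import numpy as np

def ray_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two rays, used as the
    triangulated 3-D point when the rays do not exactly intersect.
    o1, o2 are ray origins (the camera centers); d1, d2 are direction
    vectors.  Assumes the rays are not parallel."""
    o1, d1 = np.asarray(o1, float), np.asarray(d1, float)
    o2, d2 = np.asarray(o2, float), np.asarray(d2, float)
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w = o1 - o2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b            # > 0 for non-parallel rays
    s = (b * e - c * d) / denom      # parameter along ray 1
    t = (a * e - b * d) / denom      # parameter along ray 2
    return ((o1 + s * d1) + (o2 + t * d2)) / 2.0
```

When the two rays happen to intersect, the midpoint is the intersection itself; otherwise it splits the gap between them, which also gives a residual distance usable as a match-quality check.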
(diagram showing x,y,z change)

		LOCATE THE BOLT
			The same process can be used to set up the program for
			locating the bolt.  Remember that there are two distinct
			steps possible (1) locate the bolt while it is poised
			over the hole (the vision is not as time critical since
			the bolt is not moving) and (2) track and servo the
			bolt in the hole ... very time critical ... our system
			might attempt this ??? ... or do things stop and go???
			Stereo is important at this stage because there isn't
			the support hypothesis to determine the actual 3-D 
			positions from 2-D picture location (as there was
for locating the hole).  There is, of course, the arm's
measurement of Z, but 1→6mm off in Z makes quite a change
in X and Y because of the angle of the cameras ... 

			**** DO YOU REALLY INTEND TO DYNAMICALLY SERVO THE
			BLOODY ARM ???  IT CERTAINLY SEEMS FEASIBLE IF THE
			LOCATION OF THE BOLT CAN BE DONE BY A FEW CORRELATIONS
			OR SOME SUCH THING ... THERE ARE REAL DYNAMIC PROBLEMS
			THOUGH ... EG. HOW TO GIVE DELTA CHANGES TO THE ARM
			ESPECIALLY SINCE ANY CORRECTION WILL HAVE TO INCLUDE
			A PREDICTION OF WHERE THE ARM PROGRESSED TO WHILE THE
			MACHINE WAS TRYING TO FIND THE BOLT IN THE PICTURE 
			... ****
2. Verification Vision System Design

VV Overall SYSTEM Organization 
(vision in general ... some related work...)

2.1 "Vision Mandala"

Notice this is independent of control structure ... top-down and bottom-up,
	VV is by definition, top-down
	DV (descriptive vision) is by definition, bottom-up
	... roughly characteristics which determine top-down vs. bottom-up
Elements of vision representation
	2-D image rep:
		"raw data": video, depth
		"raw feature pictures": edge, contour, ...
		"interpreted features": lines, corners, curves,
	3-D image rep:
		geometric, space, ... good grief
		surface photometry ...
		physics (support)
	special task rep:


point out how others fit into this scheme ... Roberts, Falk, Waltz, Krakauer
	ROBERTS ... parameterized models ... pic, edge, lines & polygons,
		topol match to model, pick "best" transform from model to
		data, uses support to determine final 3-D position
	regions GARVEY ... & SRI PROGRESS REPORT ... 
		YAKIMOVSKY, AND LIEBERMAN
	correlation QUAM ... MARSHA JO
	blocks GUZMAN FALK WALTZ GRAPE GILL PERKINS PERKINS
	contours KRAKAUER BAUMGART
	hidden line ... WATKINS ...
	graphics ... GOURAUD, (latest Utah) 

Fit VV in ... point out levels possible ... give some tradeoffs and reasons
    for dealing at each level ...
Fit in a "grand scheme" and then show "actual scheme ... in pieces"

Task accomplished ... in pieces

purpose ... demonstrate ...  relationship  to the "grand scheme" Need
to describe  Stanford's system to put the  various existing pieces of
the system in perspective (so to speak) diagram of steps


3.1 Prediction by simulation - BGB

Prediction
	Goal: predict view (eventually whole movie) ... maybe just beginning for
		interactive system 
	model ... 3-D geometric + photometry (GEOMED)
	hidden line elim => mosaic with photometry, links to 3-D, and "descriptions"
	"descriptions" like Waltz ...
	example: circle with obscuring plane in front of it ... approx by lines,
		show "labelling" and info given to characterizer ... with why and
		how ...



3.2 Prediction by training - RCB

Training

Goal: "second calibration step" of the models (geometric, photometric,...)
	... the first step is  the initial model ... a third step might
	be the "calibration" from one picture of a sequence to the next
	(eg. following the bolt into the hole ... slightly different
	for each assembly)
logically is another VV problem, but one-shot so less time-dependent
	ie. it uses prediction, comparison, and correction 
	the corrections are different ... updating camera vs. object pos
	Another distinction:  almost necessarily interactive  to
	insure the  correct points (features) are matched up ...
	"under a teacher's eye"
	... described here, because this is its position within a task
Benefits of training:
4. Comparison

Goal of compare: match points of model with points in picture (or features
		more generally)
Currently sort of "fixed" strategy ... use big curves until tolerances are
	narrowed down well enough to use expensive correlations
	model used ... and dynamically changes as comparison progresses
Manual override if confusing curves possible, etc. ... not very good ..alt
with curves ... cost/benefit idea
	costs used ... cost of edge op, correl, #expected, etc. benefit?
Conservative ... works like ... step thru example
Model for curves is 2-D (in image)... for correl is too, but wrt to table
future automatic strategies (spec)
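The correlation step above can be sketched as an exhaustive normalized cross-correlation of a small patch over a search window; the real system would presumably restrict the window using the tolerances rather than scanning everywhere:

```python
import numpy as np

def best_match(patch, window):
    """Slide a patch over a search window and return the (row, col) of
    the best normalized cross-correlation, plus the score itself.
    Brute force; intended only to show the comparison being made."""
    ph, pw = patch.shape
    p = patch - patch.mean()                 # zero-mean template
    best_score, best_rc = -2.0, (0, 0)
    for r in range(window.shape[0] - ph + 1):
        for c in range(window.shape[1] - pw + 1):
            w = window[r:r + ph, c:c + pw]
            w = w - w.mean()
            denom = np.sqrt((p * p).sum() * (w * w).sum())
            if denom == 0:                   # flat region: no information
                continue
            ncc = (p * w).sum() / denom      # in [-1, 1]
            if ncc > best_score:
                best_score, best_rc = ncc, (r, c)
    return best_rc, best_score
```

The score also serves as the "confidence of finding the best correlation" that training is supposed to characterize for each patch.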
5. Correction.
		Correcting the camera model.
		Correcting the world model.
Correction
	Goal of correction: determine an improved estimate for an object's position
		could be relative to some other object (as in our case: bolt tip
		wrt the hole) or "sort of absolute" (ie. wrt workstation coords)
	model ... stereo ... relative change for arm
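At its simplest, the relative correction handed to the arm is just the difference between the stereo-measured and the predicted 3-D positions; a sketch:

```python
def arm_correction(predicted_xyz, measured_xyz):
    """Relative (dx, dy, dz) to add to the arm's planned destination,
    computed from the predicted position of the target versus the
    position measured by stereo.  A placeholder for the correction
    step, not the system's actual routine."""
    return tuple(m - p for p, m in zip(predicted_xyz, measured_xyz))
```

A full correction would also update the rotational part of the relevant model (camera or object), as the two sub-headings above indicate.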
6. Application of Verification Vision

DISCLAIMER:

The task was designed to point out the various types of knowledge
available and to demonstrate a system design which is sufficiently
general to take advantage of such knowledge.  In particular, our
approach was not designed to be the "best" way of accomplishing the
task, but rather as A way with an available hardware configuration.
For example, narrow angle stereo is probably more generally useful for
this type of verification, because fewer features change from one
view to the next.

GOAL: INSERT THE BOLT(S) INTO THE BRAKE SUBASSEMBLY
(actual picture of setup)

"SITUATION" (assumptions listed: general→specific)
	programmable assembly environment ... means that there are
		cameras, arms, vises, lights, etc. under computer
		control ... which in turn means that the environment
		is predictable (eg. the lighting)
	one arm ... well calibrated, absolute (within 6.0mm) and repeatable
		(within 1.5mm)
	two cameras ... well calibrated aspect ratio, AR, (within ****)
		and focal ratio, FR, (within ****) ... plus roughly
		calibrated work station → camera transform (within ****)
	lighting ... located at position(s) ... and fixed
	bolt dispenser ... in a fixed location and able to dispense bolts
		within tolerances 1mm x 1mm x .1mm .......... which means
		that the arm (using the repeatability tolerance) can
		pick up a bolt within 2.5 mm etc.

**** CAN FAKE IT ... JUST PUT THE BOLT IN THE HAND ****

brake subassembly ... upright, positioned at (X0,Y0) (satisfying
	the constraints: -10mm ≤ (actual X - X0) ≤ +10mm and
	-10mm ≤ (actual Y - Y0) ≤ +10mm ... and the rotation about
	its center is plus or minus 10 degrees) ... these are realistic
	tolerances resulting from a UNIMATE placing the subassembly
	at the desired position at the workstation



Automatic assembly
Cart

	one way of looking at this is that a cart with a map of the 
road, plus possibly contours, has to do more "revelation"
vision, but as it progresses, it can do verification
vision ... training could be a previous trip along the same
road ... in some sense the relative motion problems are
different (screwdriver ... camera stays still, screwdriver
moves ... with the cart ... the world stays still (more
or less) and the camera moves ... )
a smart cart (everyone ought to have one) should also do recognition
vision ... for cars, cross streets, ...


7. Conclusion

	future, future, future, ... I see & I see => I am  ⊃ I am (sort of)
    Future:
	Fancier features & tolerances ... eg. auto correl pred from 3-D, Waltz...
	Fancier automatic location of features ... confidence level, 3-D compare mod
	Fancier automatic strategy development ...2-D, 3-D, tolerance simul
		... modelling relative motion
	Fancier math for arbitrary axis of rotation ... 

8. References.

Baumgart

Bolles

Falk

Roberts